Texture Cache Approximation on GPUs
Abstract
We present texture cache approximation as a method for using existing hardware on GPUs to eliminate costly global memory accesses. We develop a technique for using a GPU’s texture fetch units to generate approximate values, and argue that this technique is applicable to a wide variety of GPU kernels. Applying texture cache approximation to an image blur kernel on an NVIDIA 780GTX, we obtain a 12% reduction in kernel execution time while only introducing 0.4% output error in the final image.
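The abstract does not spell out how the texture fetch units produce approximate values, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It emulates in plain Python a linearly filtered 1D texture fetch whose blend weight is quantized to 8 fractional bits (mirroring the fixed-point weight precision documented for CUDA linear filtering). It then uses two such fetches to stand in for the three reads of a 3-tap box blur. The function names (`tex1d_linear`, `blur3_exact`, `blur3_two_fetches`, `compare`), the test signal, and the error metric are all hypothetical choices made for illustration.

```python
import math


def tex1d_linear(texels, x, frac_bits=8):
    """Emulate a linearly filtered 1D texture fetch with clamped addressing.

    The blend weight is quantized to `frac_bits` fractional bits (an
    assumption of this sketch); that quantization is the only source of
    approximation here.
    """
    xb = x - 0.5                              # texel centers sit at i + 0.5
    i = math.floor(xb)
    scale = 1 << frac_bits
    alpha = round((xb - i) * scale) / scale   # low-precision blend weight
    last = len(texels) - 1
    lo = texels[min(max(i, 0), last)]
    hi = texels[min(max(i + 1, 0), last)]
    return (1.0 - alpha) * lo + alpha * hi


def blur3_exact(texels, i):
    """3-tap box blur using three separate reads (clamped at the borders)."""
    last = len(texels) - 1
    taps = [texels[min(max(j, 0), last)] for j in (i - 1, i, i + 1)]
    return sum(taps) / 3.0


def blur3_two_fetches(texels, i):
    """Approximate the same blur from two filtered fetches."""
    left = tex1d_linear(texels, i - 1.0 / 6.0)   # ~ 2/3*T[i-1] + 1/3*T[i]
    right = tex1d_linear(texels, i + 7.0 / 6.0)  # ~ 1/3*T[i] + 2/3*T[i+1]
    return 0.5 * (left + right)


def compare(texels, value_range=255.0):
    """Return (mean error as % of value_range, max absolute error)."""
    errs = [abs(blur3_exact(texels, i) - blur3_two_fetches(texels, i))
            for i in range(len(texels))]
    return 100.0 * sum(errs) / len(errs) / value_range, max(errs)


if __name__ == "__main__":
    signal = [float((i * 37) % 256) for i in range(64)]  # made-up test data
    mean_pct, max_abs = compare(signal)
    print(f"mean error: {mean_pct:.4f}% of range, max abs error: {max_abs:.4f}")
```

In this toy model the weights 1/3 and 2/3 are not exactly representable in 8 fractional bits, so the two-fetch result differs slightly from the exact blur. The figures it prints describe only this sketch and are not comparable to the 12% and 0.4% reported in the abstract.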
Similar resources
Fusion Coherence: Scalable Cache Coherence for Heterogeneous Kilo-Core System
Future heterogeneous systems will integrate CPUs and GPUs on a single chip to achieve both high computing performance and high throughput. Such systems are expected to abandon the current discrete design and build a unified shared memory system, avoiding explicit data movement among CPUs and GPUs connected by a high-throughput NoC. We propose a scalable cache coherence solution, Fusion Coherence ...
Cache-efficient numerical algorithms using graphics hardware
We present cache-efficient algorithms for scientific computations using graphics processing units (GPUs). Our approach is based on mapping the nested loops in the numerical algorithms to the texture mapping hardware and efficiently utilizing GPU caches. This mapping exploits the inherent parallelism, pipelining and high memory bandwidth on GPUs. We further improve the performance of numerical a...
Benchmarking the Memory Hierarchy of Modern GPUs
Memory access efficiency is a key factor for fully exploiting the computational power of Graphics Processing Units (GPUs). However, many details of the GPU memory hierarchy are not released by the vendors. We propose a novel fine-grained benchmarking approach and apply it on two popular GPUs, namely Fermi and Kepler, to expose the previously unknown characteristics of their memory hierarchies. ...
A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations using Graphics Processors
We present a fast sorting algorithm using graphics processors (GPUs) that adapts well to database and data mining applications. Our algorithm uses texture mapping and blending functionalities of GPUs to implement an efficient bitonic sorting network. We take into account the communication bandwidth overhead to the video memory on the GPUs and reduce the memory bandwidth requirements. We also pr...
Parallel Algorithm of IDCT with GPUs and CUDA for Large-scale Video Quality of 3G
When video is transmitted over 3G networks, the video quality might suffer from impairments caused by packet losses. Extracting video quality features requires a set of algorithms, among which the inverse discrete cosine transform is an important one. To improve performance and support real-time evaluation of 3G video quality, two different parallel algorithms with CUD...